In silico prediction of splice-altering single nucleotide variants in the human genome

Authors

  • Xueqiu Jian
  • Eric Boerwinkle
  • Xiaoming Liu
Abstract

In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
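The receiver operating characteristic comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the tool names, labels, and scores below are hypothetical toy values, and the area under the ROC curve is computed with the standard pairwise (Mann-Whitney) formulation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    Over all (positive, negative) pairs, count how often the positive
    variant receives the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = splice-altering variant, 0 = not splice-altering.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical per-variant scores from two placeholder tools.
tool_scores = {
    "tool_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "tool_b": [0.6, 0.3, 0.7, 0.5, 0.4, 0.2],
}

# A higher AUC means the tool ranks splice-altering variants above the rest more often.
aucs = {name: roc_auc(labels, s) for name, s in tool_scores.items()}
for name, auc in sorted(aucs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: AUC = {auc:.3f}")
```

Ranking tools by AUC in this way needs no score threshold, which is useful when the compared tools report scores on different scales.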

Similar resources

In-silico study to identify the pathogenic single nucleotide polymorphisms in the coding region of CDKN2A gene

Background: CDKN2A, encoding two important tumor suppressor proteins p16 and p14, is a tumor suppressor gene. Mutations in this gene and subsequently the defect in p16 and p14 proteins lead to the downregulation of RB1/p53 and cancer malignancy. To identify the structural and functional effects of mutations, various powerful bioinformatics tools are available. The aim of this study is the ident...

The effect of adenosine 5'-triphosphate on inducing apoptosis and inhibiting expression of the Survivin gene and its anti-apoptotic splice variant SUR-3B in K562 cells

Introduction: Leukemia is a heterogeneous malignant disease in which progression at the level of CD34+ cells has a major impact in drug resistance and relapse. The multi-drug resistance gene product, P-glycoprotein is an inhibitor of apoptosis proteins (IAPs), such as Survivin that are expressed simultaneously with several putative drug resistance parameters in CD34+ leukemia cells. In fact, IA...

In silico analysis for determining the deleterious nonsynonymous single nucleotide polymorphisms of BRCA genes

Recent advances in DNA sequencing techniques have led to an increase in the identification of single nucleotide polymorphisms (SNPs) in BRCA1 and BRCA2 genes, but no further information regarding the deleterious probability of many of them is available (Variants of Unknown Significance/VUS). As a result, in the current study, different sequence- and structure-based computation...

The miR526b-5p-Related Single Nucleotide Polymorphisms, rs72618599, Located in 3'-UTR of TCF3 Gene, is Associated with the Risk of Breast and Gastric Cancers

Introduction: Single nucleotide polymorphisms result in dysregulation of the proto-oncogene TCF3 gene, which is associated with the development, metastasis, and chemoresistance of different malignancies. Methods: GSE10810 microarray dataset and GEPIA2 online software were used to find differentially expressed genes and the TCF3 status in breast cancer (BC) and gastric cancer (GC), respectively....

Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment

Accurate assessment of genetic variation in human DNA sequencing studies remains a nontrivial challenge in clinical genomics and genome informatics. Ascribing functional roles and/or clinical significances to single nucleotide variants identified from a next-generation sequencing study is an important step in genome interpretation. Experimental characterization of all the observed functional va...

Journal title:

Volume 42, Issue

Pages -

Publication date: 2014